In this post I want to walk through a number of SQL Server corruption recovery techniques for when you’re out of luck, have no backups, and the usual methods don’t work. I’ll be using the AdventureWorksLT2008R2 sample database as my victim.
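Before trying anything exotic, the natural first step is a full consistency check to see what we’re dealing with – a minimal sketch, assuming the database can still be attached:

-- Lists every corruption error while suppressing informational messages.
DBCC CHECKDB ('AdventureWorksLT2008R2') WITH NO_INFOMSGS, ALL_ERRORMSGS;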
Sometimes you must first do evil, to do good. Such is the case when you want to hone your skills in corruption recovery of SQL Server databases.
When I initially started working on OrcaMDF I had just one goal: to gain a deeper knowledge of MDF file internals than I could get from the books available.
I’m slightly late to announce this, but better late than never!
At the moment I’m working on extending OrcaMDF Studio to not only list base tables, DMVs and user tables, but also stored procedures. That’s easy enough – we just need to query sys.procedures, or rather the sys.sysschobjs base table, since the sys.procedures DMV isn’t available when SQL Server isn’t running.
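As a rough sketch of the equivalent query against a live instance (this requires the DAC, since base tables aren’t otherwise visible, and assumes the type code 'P' marks stored procedures):

-- sys.procedures is essentially a filtered view over sys.sysschobjs.
SELECT id, name, created, modified
FROM sys.sysschobjs
WHERE type = 'P';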
I’m really lagging behind on my blogging – life is busy at the moment! Just a couple of weeks ago I presented my Revealing the Magic session at the Norwegian Developers Conference in Oslo. I was quite excited to give my SQL Server oriented session to a crowd of developers – a 500 level session, at SQL Server events that is.
While implementing compression support for OrcaMDF, it stumped me when I discovered that integers (including datetime, money and all other types based on integers) were stored in big endian. As all other integers are stored in little endian and I couldn’t see why they’d want to change it, I assumed it must’ve been due to using different libraries or something to that effect.
While working on row compression support for OrcaMDF, I ran into some challenges when trying to parse integers. Contrary to normal non-compressed integer storage, these are all variable width – meaning an integer with a value of 50 will only take up one byte, instead of the usual four. That wasn’t new though, seeing as vardecimals are also stored as variable width. What is different, however, is the way the numbers are stored on disk. Note that while I was only implementing row compression, the integer compression used in page compression is exactly the same, so this goes for both types of compression.
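A quick way to see the variable width in action from T-SQL, before dropping down to the byte level (database and table names are hypothetical, and the exact record sizes will include some overhead):

-- With row compression, the int value 50 needs just one byte of data while
-- 2,000,000,000 needs all four – visible in the min/max record sizes.
CREATE TABLE dbo.RowCompressed (Value int NOT NULL) WITH (DATA_COMPRESSION = ROW);
INSERT INTO dbo.RowCompressed VALUES (50);
INSERT INTO dbo.RowCompressed VALUES (2000000000);

SELECT min_record_size_in_bytes, max_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.RowCompressed'), 0, NULL, 'DETAILED');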
In this post I’ll do a deep dive into how vardecimals are stored on disk. For a general introduction to what they are and how/when to use them, see this post.
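As a setup sketch (database and table names are hypothetical), vardecimal storage has to be enabled at both the database and the table level before there’s anything to look at on disk:

EXEC sp_db_vardecimal_storage_format 'MyDatabase', 'ON';
EXEC sp_tableoption 'dbo.MyTable', 'vardecimal storage format', 1;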
Determining whether vardecimal storage is enabled for a given table is usually done by using the OBJECTPROPERTY function like so:
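(The table name below is just a placeholder; 1 means vardecimal storage is enabled, 0 that it isn’t.)

SELECT OBJECTPROPERTY(OBJECT_ID('dbo.MyTable'), 'TableHasVarDecimalStorageFormat') AS VardecimalEnabled;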
LOB types like varchar(MAX), nvarchar(MAX), varbinary(MAX) and xml suffer from split personality disorder. SQL Server may store values in-row or off-row depending on the size of the value, the available space in the record and the table settings. Because of this, it’s no easy task to predict the size of the pointer left in the record itself. You might even say… It depends.
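One of the table settings in play, as a sketch (table name is hypothetical): forcing MAX types off-row means the record only ever holds a 16-byte pointer, no matter the value size.

EXEC sp_tableoption 'dbo.MyTable', 'large value types out of row', 1;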
While adding some extra sparse column tests to the OrcaMDF test suite, I discovered a bug in my parsing of records. While the problem was simple enough, it took me a while to debug. Running the test, it worked about 40% of the time while failing the remaining 60% of the time. As I hadn’t picked up on the pattern, I happily fixed (or so I thought) the bug, ran my test and verified that it was working. Shortly after, the test failed – without me having changed any code. After having the first few strands of hair turn grey, I noticed the pattern and subsequently fixed the bug.
Warning: this is (most likely) a case of “select isn’t broken” – it’s just not working as I’d expect. It may very well be that I’m just overlooking something, in which case I hope someone will correct me :)
While implementing sparse column support for OrcaMDF, I ran into a special condition that caught me by surprise – a data record with no null bitmap. Even Paul Randal mentioned that the null bitmap would always be present in data records in his A SQL Server DBA myth a day: (6/30) three null bitmap myths post.
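As far as I can tell, one way to reproduce such a record is a table where every column is sparse – a repro sketch with hypothetical names:

-- With all columns sparse, the null bitmap has nothing to track; NULLs are
-- handled by the sparse vector (absent columns) instead.
CREATE TABLE dbo.OnlySparse (A int SPARSE NULL, B int SPARSE NULL);
INSERT INTO dbo.OnlySparse (A) VALUES (1);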
In this post I’ll be looking at the internal storage mechanism that supports sparse columns. For an introduction to what sparse columns are and when they ought to be used, take a look here.
For an introduction to the anatomy of records, I suggest you read this post by Paul Randal.
Based on my findings exploring the sys.system_internals_partition_columns.ti field, I needed a parser that could extract the scale, precision, max_length as well as the max_inrow_length fields from it. The tricky part is that those values are stored differently for each individual type, on top of the fact that some types have hardcoded defaults that are not stored in the ti field, even though there’s space for them.
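For reference, the raw ti values can be peeked at through the sys.sysrscols base table (DAC required; my understanding is that the low byte holds the system type ID, with length, precision and scale packed into the remaining bytes in a type-dependent layout):

SELECT rsid, rscolid, ti, ti & 0xFF AS system_type_id
FROM sys.sysrscols;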
Running sp_helptext on the sys.system_internals_partition_columns system view reveals the following internal query:
Lately I’ve been working on nonclustered index parsing. One of my test cases proved to be somewhat more tricky than I’d anticipated, namely the parsing of nonclustered indexes for non-unique clustered tables. Working with non-unique clustered indexes, we’ll have to take care of uniquifiers when necessary.
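A small repro sketch of the situation (all names hypothetical): duplicate clustered key values get a hidden uniquifier, and that uniquifier also has to appear in the nonclustered index records that point back to them.

CREATE TABLE dbo.NonUniqueClustered (ID int NOT NULL, Name varchar(50) NOT NULL);
CREATE CLUSTERED INDEX CX_ID ON dbo.NonUniqueClustered (ID);
CREATE NONCLUSTERED INDEX IX_Name ON dbo.NonUniqueClustered (Name);

-- The two rows with ID = 1 share a clustered key and thus need uniquifiers.
INSERT INTO dbo.NonUniqueClustered VALUES (1, 'A'), (1, 'B'), (2, 'C');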
Earlier today I was doing some ad-hoc querying to retrieve some numbers for the month of May. Not giving it deeper thought, I made a simple query like this:
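Purely as an illustration of the kind of query I mean (table, column and the exact predicate are all invented here):

SELECT COUNT(*)
FROM dbo.Orders
WHERE OrderDate BETWEEN '2011-05-01' AND '2011-05-31';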
Earlier this week I provided some details on the forwarding stub that’s left behind when a heap record is forwarded. In this post I’ll look at the second part of a forwarded record – the actual record to which the forwarding stub points.
A forwarded record occurs whenever a record in a heap increases in size and no longer fits on the page. Instead of causing a page split, as would happen had the table not been a heap, the record is moved onto another page with enough free space, or onto a newly allocated page. Forwarded records can wreak havoc on your performance due to fragmentation, but I won’t cover that here, as many other more skilled people have already done so.
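A quick way to check whether a heap suffers from them (the heap name is hypothetical) is the forwarded_record_count column of sys.dm_db_index_physical_stats, which is only populated in DETAILED mode:

SELECT forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyHeap'), 0, NULL, 'DETAILED');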
I recently stumbled upon a question on Stack Overflow on how best to reduce their data size, as it’s growing out of hand. As the original author hasn’t replied back yet (as of writing this post), I’m making some assumptions about the scenario – so take it as an abstract one. The basic scenario is that they have a number of measuring stations, each of them containing a lot of equipment reporting back to a SQL Server, in a schema like the following abstraction:
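(A hypothetical rendering – all names and types below are my own invention, just to have something concrete to reason about.)

CREATE TABLE dbo.Stations (StationID int IDENTITY PRIMARY KEY, Name varchar(50) NOT NULL);
CREATE TABLE dbo.Equipment (EquipmentID int IDENTITY PRIMARY KEY, StationID int NOT NULL REFERENCES dbo.Stations (StationID));
CREATE TABLE dbo.Measurements
(
	EquipmentID int NOT NULL REFERENCES dbo.Equipment (EquipmentID),
	MeasuredAt datetime NOT NULL,
	Value decimal(18, 5) NOT NULL
);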
One of the first challenges I faced when starting out the development of OrcaMDF was parsing page headers. We all know that pages are basically split into two parts, the 96 byte header and the 8096 byte body of remaining bytes. Much has been written about headers, and Paul Randal (b|t) has a great post describing the contents of the header as well. However, though the contents have been described, I’ve been completely unable to find any kind of details on the storage format. What data types are the individual fields, and what’s the order? Oh well, we’ve always got DBCC PAGE.
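For reference, this is the kind of output I’m comparing against – a minimal sketch, with the database name and page ID being placeholders:

-- Trace flag 3604 sends DBCC output to the client; print option 2 dumps the
-- header fields followed by a hex dump of the whole page.
DBCC TRACEON (3604);
DBCC PAGE ('MyDatabase', 1, 197, 2);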
Bits are stored very differently from other fixed length data types in SQL Server. Usually all fixed length columns will be present, one after the other, in the fixed data part of a record. As the smallest unit of data we can write to disk is a byte, the naïve approach to storing bits would be to use a whole byte for each bit. It would be very simple to parse as it would follow the usual scheme, but it would also waste quite some space.
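The byte sharing is visible through the system internals views too – a sketch with a hypothetical table, where all eight bit columns should report the same leaf_offset but different leaf_bit_position values:

CREATE TABLE dbo.BitTest (A bit, B bit, C bit, D bit, E bit, F bit, G bit, H bit);

SELECT c.name, pc.leaf_offset, pc.leaf_bit_position
FROM sys.system_internals_partition_columns pc
INNER JOIN sys.partitions p ON p.partition_id = pc.partition_id
INNER JOIN sys.columns c ON c.object_id = p.object_id AND c.column_id = pc.partition_column_id
WHERE p.object_id = OBJECT_ID('dbo.BitTest');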
There are several different date related data types in SQL Server. Currently OrcaMDF supports the three most common types: date, datetime & smalldatetime.
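As a quick orientation (the values themselves don’t matter), DATALENGTH shows the on-disk footprint of each type:

SELECT
	DATALENGTH(CAST(GETDATE() AS date)) AS date_bytes,                   -- 3
	DATALENGTH(CAST(GETDATE() AS datetime)) AS datetime_bytes,           -- 8
	DATALENGTH(CAST(GETDATE() AS smalldatetime)) AS smalldatetime_bytes; -- 4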
Implementing parsing support for SQL Server data types in OrcaMDF is a simple matter of implementing the ISqlType interface:
I’ve been spamming Twitter the last couple of days with progress on my pet project, OrcaMDF. But what is OrcaMDF really?
Miracle Open World 2011
I was invited to speak at MOW2011 for the SQL Server track. Last year I got good reviews for my presentation on Dissecting PDF Documents, a deep dive into the file format of PDF files. Wanting to stay in the same groove, I decided to take a look at the MDF format, as it’s somewhat closer to SQL Server DBAs than PDF files. Having almost worn my SQL Server 2008 Internals book down from reading, I’ve always been interested in the internals, though I still felt like I was lacking a lot of knowledge.
A parser is born
For my demos at MOW I wanted to at least read the records from a data page, just like the output from DBCC PAGE. The basic page format is well documented, and it’s not the first time I’ve taken a deeper look at pages. Surprisingly quickly, I had record parsing from data pages functioning using a hardcoded schema. Parsing a single page is fun, but really, I’d like to get all the data from a table. Restricting myself to just consider clustered tables made it simpler as it’d just be a matter of following the linked list of pages from start to end. However, that meant I’d have to parse the header as well. There’s some good information out there on the anatomy of pages, but everything I could find had a distinct lack of information on the actual header structure and field types.
I often like to spend my weekends perusing the sys.system_internals_allocation_units table, looking for the remnants of Frodo and his crew. In sys.system_internals_allocation_units there are several references to relevant pages:
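The ones I’m after are the first_page, root_page and first_iam_page columns – each a 6-byte pointer where, as far as I’ve seen, the first four bytes hold the page ID and the last two the file ID:

SELECT first_page, root_page, first_iam_page
FROM sys.system_internals_allocation_units;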
SQL Server stores all of its data in what is basically a large array of “pages”. Each page is exactly 8kb and can contain various kinds of data, depending on the page type. In this entry we’ll explore how to decipher a data page.
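As a starting point for following along (the database and table names, as well as the page ID, are placeholders), DBCC IND lists a table’s pages and DBCC PAGE dumps one of them:

DBCC TRACEON (3604);
DBCC IND ('MyDatabase', 'dbo.MyTable', -1); -- -1 lists pages for all indexes
DBCC PAGE ('MyDatabase', 1, 201, 3);        -- print option 3 includes per-row details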